Bonacich centrality measures the number of attenuated paths between nodes in a network. We use
this metric to study network structure, specifically, to rank nodes and find the community structure of
the network. To this end we extend the modularity-maximization method for community detection
to use this centrality metric as a measure of node connectivity. Bonacich centrality contains a tunable
parameter that sets the length scale of interactions. Studying how rankings and discovered
communities change as this parameter is varied allows us to identify globally important nodes
and structures. We apply the proposed method to several benchmark networks and show that it
leads to better insight into network structure than earlier methods.

Centrality measures the degree to which network structure determines the
importance of a node in a network. Over the years many different centrality
metrics have been studied. Katz [1] recognized that an individual's centrality
depends not only on how many others she is connected to (her degree), but also
on their centrality. He measured the centrality of a node by the total number of
paths linking it to other nodes in a network, exponentially weighted by the
length of the path. Freeman [2] defined betweenness centrality as the fraction
of all shortest paths between pairs of nodes that pass through a given node.
Several variants of centrality based on random walks have been proposed and
analyzed [4, 5, 6, 7]. Specifically, Bonacich [3], like Katz, measured the total
number of attenuated paths from a node, but now the attenuation factors along
direct (from the originating node) and indirect (from intermediate nodes) edges
in a path can be different. These parameters set the length scale of
interactions. Unlike other centrality metrics, which do not distinguish between
local and global structure, these parameterized centrality metrics can
differentiate between locally connected nodes, i.e., nodes that are linked to
interconnected nodes, and globally connected nodes, which are linked to and
mediate communication between otherwise unconnected nodes.

In addition to ranking nodes, Bonacich centrality can be used to identify
communities within the network. In this paper, we generalize the
modularity-maximization approach [8, 9] to use Bonacich centrality. Rather than
finding regions of the network that have a greater than expected number of edges
connecting nodes [10], our approach looks for regions that have a greater than
expected number of paths between nodes. Arenas et al. [11] have similarly
generalized modularity to find correlations between nodes beyond nearest
neighbors. Their motif-based community detection algorithm uses the size of the
motif to impose a limit on the proximity of neighbors. Our method, on the other
hand, imposes no such limit. The measure of global correlation computed using
Bonacich centrality is equal to the weighted average of correlations for motifs
of different sizes. Our method enables us to easily calculate this complex term.

We use Bonacich centrality to study the structure of 
several benchmark networks, as well as the network ex- 
tracted from a social web site. We show that parameter- 
ized centrality can identify locally and globally important 
nodes and structures, leading to a better understanding 
of network structure. 

Bonacich [3] defined a centrality metric $C_{ij}(\alpha, \beta)$ as the total
number of attenuated paths between nodes $i$ and $j$, with $\beta$ and $\alpha$
giving the attenuation factors along direct edges (from $i$) and indirect edges
(from intermediate nodes) in the path from $i$ to $j$. Given the adjacency
matrix of the network $A$, the Bonacich centrality matrix can be computed as
follows:

$$C(\alpha, \beta) = \beta A + \beta \alpha_1 A \cdot A + \cdots + \beta \prod_{j=1}^{n} \alpha_j A^{n+1} + \cdots \qquad (1)$$

The first term gives the number of paths of length one (edges) from $i$ to $j$,
the second the number of paths of length two, etc. Although $\alpha_j$ along
different edges in a path could in principle be different, for simplicity, we
take them all to be equal: $\alpha_j = \alpha$ for all $j$. In this case, the
series converges to $C(\alpha, \beta) = \beta A (I - \alpha A)^{-1}$, which
holds while $\alpha < 1/\lambda$, where $\lambda$ is the largest characteristic
root of $A$ [12]. For $\alpha = \beta$, Bonacich centrality reduces to the Katz
score [1].
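
The closed form can be checked numerically against the truncated series. The following is a minimal sketch, assuming NumPy and a small hypothetical path graph; the function names `bonacich_closed_form` and `bonacich_series` are illustrative and do not come from the original work.

```python
import numpy as np


def bonacich_closed_form(A, alpha, beta=1.0):
    """Return C = beta * A (I - alpha A)^{-1}; requires alpha < 1/lambda_max."""
    A = np.asarray(A, dtype=float)
    lam_max = np.max(np.abs(np.linalg.eigvals(A)))
    if alpha * lam_max >= 1.0:
        raise ValueError("alpha must be below 1/lambda_max for convergence")
    identity = np.eye(A.shape[0])
    return beta * A @ np.linalg.inv(identity - alpha * A)


def bonacich_series(A, alpha, beta=1.0, terms=200):
    """Truncate Eq. (1) with all alpha_j = alpha: sum of beta * alpha^(n-1) * A^n."""
    A = np.asarray(A, dtype=float)
    total = np.zeros_like(A)
    power, weight = A.copy(), beta
    for _ in range(terms):
        total += weight * power   # add the contribution of paths of length n
        power = power @ A         # A^n -> A^(n+1)
        weight *= alpha           # one more attenuated (indirect) edge
    return total


# Hypothetical toy network: an undirected path 0-1-2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
C_closed = bonacich_closed_form(A, alpha=0.3)
C_series = bonacich_series(A, alpha=0.3)
print(np.allclose(C_closed, C_series))
```

The explicit check on `alpha * lam_max` mirrors the convergence condition stated above; outside that range the matrix inverse may still exist, but it no longer equals the sum of attenuated paths.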

Bonacich centrality (b-centrality) contains a tunable parameter $\alpha$ that
sets the length scale of interactions. For $\alpha = 0$ (and $\beta = 1$),
b-centrality takes into account direct edges only and reduces to degree
centrality. As $\alpha$ increases, $C(\alpha, \beta)$ becomes a more global
measure, taking into account ever larger network components. The expected
length of a path, the radius of centrality, is $(1 - \alpha)^{-1}$. This tunable
parameter turns b-centrality into a powerful tool for studying network
structure.
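
One way to read the radius $(1 - \alpha)^{-1}$, offered as an illustrative derivation rather than the authors' own argument: assume $0 \le \alpha < 1$ and weight a path of length $n$ by $\alpha^{n-1}$, ignoring how many such paths a particular network contains. Normalizing these weights gives a geometric distribution $p_n$ over path lengths, whose mean is the stated radius.

```latex
\begin{align*}
\sum_{n=1}^{\infty} \alpha^{n-1} &= \frac{1}{1-\alpha}
  \quad\Longrightarrow\quad
  p_n = (1-\alpha)\,\alpha^{n-1}, \\
\langle n \rangle = \sum_{n=1}^{\infty} n\,p_n
  &= (1-\alpha) \sum_{n=1}^{\infty} n\,\alpha^{n-1}
  = (1-\alpha)\cdot\frac{1}{(1-\alpha)^{2}}
  = \frac{1}{1-\alpha}.
\end{align*}
```

Under this reading, $\alpha = 0$ gives a radius of 1 (direct edges only), and $\alpha = 0.5$ gives a radius of 2.
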

Following Bonacich, we use $C_i(\alpha, \beta) = \sum_j C_{ij}(\alpha, \beta)$
as the measure of how 'close' node $i$ is to other nodes in a network. A node
has high centrality if it is connected to many highly interconnected nodes,
i.e., it is a leader within its community. A node can also have high centrality
if it is connected to nodes from different communities. Such mediators bridge
different communities, enabling communication between them [13]. We can identify
such nodes because their b-centrality increases with $\alpha$. Other centrality
metrics do not distinguish between locally and globally connected nodes.
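
The node-level score and its dependence on the attenuation parameter can be sketched as follows. This is a minimal illustration, assuming NumPy and a hypothetical toy graph of two triangles joined through a bridging node; `node_centrality` is an illustrative name, and normalizing the scores so that they sum to one is an assumption made here to compare rankings across parameter values, not something the text specifies.

```python
import numpy as np


def node_centrality(A, alpha, beta=1.0, normalize=True):
    """Row sums C_i = sum_j C_ij of the Bonacich matrix beta * A (I - alpha A)^{-1}."""
    A = np.asarray(A, dtype=float)
    C = beta * A @ np.linalg.inv(np.eye(A.shape[0]) - alpha * A)
    scores = C.sum(axis=1)
    # Normalization is an assumption of this sketch, used only to compare runs.
    return scores / scores.sum() if normalize else scores


# Hypothetical toy graph: triangles (0, 1, 2) and (4, 5, 6) bridged by node 3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
A = np.zeros((7, 7))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# The series converges only below the reciprocal of the largest eigenvalue.
alpha_max = 1.0 / np.max(np.abs(np.linalg.eigvalsh(A)))

for alpha in (0.0, 0.5 * alpha_max, 0.9 * alpha_max):
    scores = node_centrality(A, alpha)
    ranking = np.argsort(-scores)  # node indices from most to least central
    print(f"alpha={alpha:.3f} ranking={ranking.tolist()}")
```

Comparing the rankings printed for successive parameter values is one way to look for nodes whose relative standing changes as longer paths are taken into account.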

Girvan & Newman [10] proposed modularity as a metric for evaluating the
community structure of a network.
The modularity-optimization class of community detection algorithms [8, 9, 14]
finds a network division that maximizes the modularity Q = (connectivity within
community) - (expected connectivity), where connectivity is the density of
edges. We extend this definition to use b-centrality as the measure of network
connectivity [15].
Therefore, in the best division of a network, nodes have more paths connecting
them to nodes within their community than to outside nodes. We generalize
modularity $Q$ as

$$Q(\alpha) = \sum_{ij} \left[ C_{ij} - \bar{C}_{ij} \right] \delta(s_i, s_j) \qquad (2)$$

where $C_{ij}$ is given by Eq. (1), $\bar{C}_{ij}$ is the expected
b-centrality, and $s_i$ is the index of the community node $i$ belongs to, with
$\delta(s_i, s_j) = 1$ if $s_i = s_j$; otherwise, $\delta(s_i, s_j) = 0$. We
round the values of $C_{ij}$ to the nearest integer. Since $\beta$ factors out
of modularity, we consider dependence on $\alpha$ only.

To compute expected centrality, we consider a graph, referred to as the null
model, which has the same number of nodes and edges as the original graph, but
in which the edges are placed at random. To make the derivation below more
intuitive, instead of b-centrality we talk of the number of attenuated paths.
When all the nodes are placed in a single group, then axiomatically, $Q = 0$.
Therefore $\sum_{ij} [C_{ij} - \bar{C}_{ij}] = 0$, and we set
$W = \sum_{ij} C_{ij} = \sum_{ij} \bar{C}_{ij}$. That is, the total number of
paths between nodes in the null model, $\sum_{ij} \bar{C}_{ij}$, is equal to the
total number of paths in the original graph, $\sum_{ij} C_{ij}$. We further
restrict the choice of null model to one where the expected number of paths
reaching node $j$, $W_j^{in}$, is equal to the actual number of paths reaching
the corresponding node in the original graph:
$W_j^{in} = \sum_i \bar{C}_{ij} = \sum_i C_{ij}$. Similarly, we also assume that
in the null model, the expected number of paths originating at node $i$,
$W_i^{out}$, is equal to the actual number of paths originating at the
corresponding node in the original graph:
$W_i^{out} = \sum_j \bar{C}_{ij} = \sum_j C_{ij}$.
Next, we reduce the original graph $G$ to a new graph $G'$ that has the same
number of nodes as $G$ and total number of edges $W$, such that each edge has
weight 1 and the number of edges between nodes $i$ and $j$ in $G'$ is $C_{ij}$.
Now the expected number of paths between $i$ and $j$ in graph $G$ can be taken
as the expected number of edges between nodes $i$ and $j$ in graph $G'$, and the
actual number of paths between nodes $i$ and $j$ in graph $G$ can be taken as
the actual number of edges between node $i$ and node $j$ in graph $G'$. The
equivalent random graph $G''$ is used to find the expected number of edges from
node $i$ to node $j$. In this graph the edges are placed at random subject to
constraints: (i) the total number of edges in $G''$ is $W$; (ii) the out-degree
of node $i$ in $G''$ = out-degree of node $i$ in $G'$ = $W_i^{out}$; (iii) the
in-degree of node $j$ in $G''$ = in-degree of node $j$ in $G'$ = $W_j^{in}$.
Thus in $G''$ the probability that an edge will emanate from a particular node
depends only on the out-degree of that node; the probability that an edge is
incident on a particular node depends only on the in-degree of that node; and
the probabilities of the two nodes being the two ends of a single edge are
independent of each other. In this case, the probability that an edge exists
from $i$ to $j$ is given by
$P(\text{emanates from } i) \cdot P(\text{incident on } j) = (W_i^{out}/W)(W_j^{in}/W)$.
Since the total number of edges in $G''$ is $W$, the expected number of edges
between $i$ and $j$ is $W \cdot (W_i^{out}/W)(W_j^{in}/W) = \bar{C}_{ij}$, the
expected b-centrality in $G$.
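
The null model and the generalized modularity can be put together in a few lines. The sketch below assumes NumPy and a hypothetical graph of two four-node cliques joined by one edge; the function names are illustrative, and the rounding of centrality values to integers mentioned earlier is omitted for simplicity.

```python
import numpy as np


def bonacich_matrix(A, alpha, beta=1.0):
    """C = beta * A (I - alpha A)^{-1}, assuming alpha < 1/lambda_max."""
    A = np.asarray(A, dtype=float)
    return beta * A @ np.linalg.inv(np.eye(A.shape[0]) - alpha * A)


def expected_centrality(C):
    """Null model: C_bar[i, j] = W_out[i] * W_in[j] / W."""
    W = C.sum()            # total number of attenuated paths
    w_out = C.sum(axis=1)  # paths originating at each node
    w_in = C.sum(axis=0)   # paths reaching each node
    return np.outer(w_out, w_in) / W


def generalized_modularity(C, communities):
    """Q = sum_ij (C_ij - C_bar_ij) * delta(s_i, s_j)."""
    s = np.asarray(communities)
    same_group = s[:, None] == s[None, :]
    return float(((C - expected_centrality(C)) * same_group).sum())


# Hypothetical graph: cliques {0, 1, 2, 3} and {4, 5, 6, 7} joined by edge 3-4.
A = np.zeros((8, 8))
for group in (range(0, 4), range(4, 8)):
    for i in group:
        for j in group:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0

C = bonacich_matrix(A, alpha=0.1)
q_single = generalized_modularity(C, [0] * 8)
q_split = generalized_modularity(C, [0, 0, 0, 0, 1, 1, 1, 1])
print(q_single, q_split)
```

Placing every node in one group gives a value of zero up to floating-point error, which matches the axiom used to fix $W$ in the derivation.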

Once we compute $Q(\alpha)$, we have to select an algorithm to divide the
network into communities that optimize $Q(\alpha)$. Brandes et al. [16] have
shown that the decision version of modularity maximization is NP-complete. Like
others [9, 17], we use the leading eigenvector method to obtain an approximate
solution. In this method, nodes are assigned to either of two groups based on a
single eigenvector corresponding to the largest positive eigenvalue of the
modularity matrix. This process is repeated for each group until modularity does
not increase further upon division.
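
A single bisection step of the leading eigenvector method might look like the following. This is a sketch under stated assumptions: it uses NumPy, takes the modularity matrix to be the difference between the centrality matrix and its null-model expectation for an undirected network, and shows one split only; the repeated subdivision of each group is not shown. The name `leading_eigenvector_bisection` and the toy graph are hypothetical.

```python
import numpy as np


def leading_eigenvector_bisection(B, tol=1e-10):
    """Split nodes by the sign of the eigenvector of the largest eigenvalue of B.

    Returns an array of 0/1 labels, or None when no positive eigenvalue exists
    (the network is then left undivided).
    """
    B = (B + B.T) / 2.0                # symmetrize against round-off
    values, vectors = np.linalg.eigh(B)  # eigenvalues in ascending order
    if values[-1] <= tol:
        return None
    return (vectors[:, -1] > 0).astype(int)


# Hypothetical graph: cliques {0, 1, 2, 3} and {4, 5, 6, 7} joined by edge 3-4.
A = np.zeros((8, 8))
for group in (range(0, 4), range(4, 8)):
    for i in group:
        for j in group:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0

alpha = 0.1  # below 1/lambda_max for this graph
C = A @ np.linalg.inv(np.eye(8) - alpha * A)           # b-centrality, beta = 1
C_bar = np.outer(C.sum(axis=1), C.sum(axis=0)) / C.sum()  # null-model expectation
labels = leading_eigenvector_bisection(C - C_bar)
print(labels)
```

Which group receives label 0 depends on the arbitrary sign of the eigenvector, so only the grouping itself is meaningful.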

We apply the formalism developed above to benchmark networks studied in the
literature, and a network extracted from the social photosharing site Flickr.
We use purity to evaluate the quality of discovered communities. We define
purity as the fraction of all pairs of objects in the same community that are
assigned to the same group by the algorithm. This is a simplified version of the
Wallace criterion [18] for evaluating performance of clustering algorithms.
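
The purity definition above translates directly into a pair-counting routine. The following is a minimal sketch in plain Python; the function name `pairwise_purity` and the example labels are hypothetical.

```python
from itertools import combinations


def pairwise_purity(true_labels, found_labels):
    """Fraction of same-community pairs that the algorithm also groups together."""
    if len(true_labels) != len(found_labels):
        raise ValueError("label lists must have the same length")
    same_true = [
        (i, j)
        for i, j in combinations(range(len(true_labels)), 2)
        if true_labels[i] == true_labels[j]
    ]
    if not same_true:
        raise ValueError("no pair of objects shares a true community")
    kept = sum(found_labels[i] == found_labels[j] for i, j in same_true)
    return kept / len(same_true)


# Hypothetical example: the algorithm splits true community "a" into two groups.
true_labels = ["a", "a", "a", "b", "b"]
found_labels = [0, 0, 1, 2, 2]
print(pairwise_purity(true_labels, found_labels))  # 2 of 4 pairs kept -> 0.5
```

Because only pairs from the same true community are counted, merging two discovered groups can never lower this score.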

First, we study the friendship network of Zachary's karate club [19]. During
the course of the study, a disagreement developed between the administrator and
the club's instructor, resulting in the division of the club into two factions,
represented by circles and squares in Figure 1(a). We find the community
division of this network for $0 \le \alpha < 0.29$ (the maximum $\alpha$ is
given by the reciprocal of the largest eigenvalue of the adjacency matrix).

The first bisection of the network results in two communities, regardless of
the value of $\alpha$, which are identical to the two factions observed by
Zachary. However, when the algorithm runs to termination (no more bisections are
possible), different communities are found for different values of $\alpha$.
For $\alpha = 0$, the method reduces to edge-based modularity maximization [14]
and leads to four communities (Figure 1(a)). For $0 < \alpha < 0.14$ it
discovers three communities, and for $0.14 < \alpha < 0.29$, two communities
that are identical to the factions found by Zachary. As Table I shows, the
purity of discovered communities increases with $\alpha$.

FIG. 1: Zachary's karate club data. (a) Circles and squares represent the two
actual factions, while colors stand for discovered communities for $\alpha = 0$.
(b) Centrality of club members vs. $\alpha$. Curve labels in (b): Nodes 15, 19,
23; Nodes 25, 26; Node 17.



TABLE I: The number and purity of communities discovered at different values of
$\alpha$

Figure 1(b) shows how b-centrality changes with $\alpha$. Nodes 34 and 1 have
the highest centrality for all values of $\alpha$. It was the disagreement
between these leaders, the club administrator (node 1) and instructor (node 34),
that led to the club's division. Nodes 33 and 2 also have high centrality and
hold leadership positions. All these nodes are scored highly by betweenness
centrality (BC) [2] and PageRank (PR) [5]. The centrality of nodes 3, 14, 9, 31,
20, and 8 increases with $\alpha$ from moderate to relatively high values. All
of them (except 8) are connected to both communities: these are the mediators.
BC scores of these nodes are low, but non-zero. Nodes 25, 26 and 17 have low
centrality, which decreases with $\alpha$. These are peripheral members. The BC
of node 17 is zero, as expected, but nodes 25 and 26 have scores similar to
node 31. PR scores of these peripheral nodes are higher than those of nodes 21,
22, and 23, which are connected to central nodes, and comparable to the scores
of mediator nodes 20 and 31. While both BC and PR correctly pick out leaders,
they do not distinguish between peripheral members and mediators.

We also studied the US college football dataset [20] and the political books
data [25]. The first network represents the schedule of Division 1 games for the
2001 season, where the nodes represent teams and the edges represent the regular
season games between teams. The teams are divided into conferences containing 8
to 12 teams each. Games are more frequent between members of the same
conference, though inter-conference games also take place. This suggests that
the natural communities may be larger than conferences. The political books
network represents books about US politics sold by the online bookseller Amazon.
Edges represent frequent co-purchasing by the same buyers, as indicated by the
"customers who bought this book also bought these other books" feature of
Amazon. The nodes were labeled liberal, neutral, or conservative by Mark Newman
based on a reading of their descriptions and reviews on Amazon [26]. The number
and purity of the communities found in these networks for various values of
$\alpha$ are shown in Table I. The $\alpha = 0$ case corresponds to the
edge-based modularity method. As $\alpha$ increases, the number of groups goes
down, while their purity increases. We were not able to evaluate rankings due to
the lack of a gold standard for these datasets.

In addition to benchmark networks, we also studied a social network retrieved
from the social photosharing site Flickr. We sampled Flickr's social network by
identifying roughly 2000 users interested in one of three topics: portraiture,
wildlife, and technology. Further, we identified four (eight for wildlife) users
who were interested in each topic by studying their profiles, specifically group
membership and users' tags. We then used the Flickr API to retrieve these users'
contacts, as well as their contacts' contacts, and labeled all by the topic
through which they were discovered.

We reduced the network to an undirected network of mutual contacts only,
resulting in a network of 5747 users, with 1620, 1337 and 2790 users labeled
technology, portraiture and wildlife respectively. Although we did not verify
that all the users were interested in the topics they were labeled with, we use
these 'soft' labels to evaluate the discovered communities. For $\alpha = 0$, we
found four groups, while for higher values of $\alpha$ ($\alpha < 0.01$), we
found three groups. As shown in Table I, the purity of discovered communities
increases steadily with $\alpha$.

We can generalize the centrality metric presented above into a notion of
path-based connectivity and relate it to other centrality metrics. Let
$q_{ij}^n$ be the number of paths of length $n$ connecting nodes $i$ and $j$.
The number of paths of length one connecting $i$ and $j$ is
$q_{ij}^1 = A_{ij}$, the number of paths of length two is
$q_{ij}^2 = (A \times A)_{ij}$, etc. The expected number of paths connecting two
nodes is:

$$E(q_{ij}) = \frac{W_1 \cdot q_{ij}^1 + W_2 \cdot q_{ij}^2 + \cdots + W_n \cdot q_{ij}^n + \cdots}{\sum_{n=1}^{\infty} W_n}$$

This value can be used to find out how connected two nodes are. Note that $W_n$
can be a scalar or a vector.

Several path-based centrality metrics can be expressed in terms of
$E(q_{ij})$, including random walk models [5, 22], the Katz score, as well as
Bonacich centrality. In random walk models, a particle starts a random walk at
node $i$, and iteratively transitions to its neighbors with probability
proportional to the corresponding edge weights. At each step, the particle
returns to $i$ with some restart probability $(1 - c)$. The proximity score is
defined as the steady-state probability that the particle will reach node $j$
[22].

• If $W_k = c^k \cdot D^{-k}$, where $c$ is a constant and $D$ is an
$n \times n$ matrix with $D_{ij} = \sum_{l=1}^{n} A_{il}$ if $i = j$ and 0
otherwise, then $E(q_{ij})$ reduces to the proximity score in the random walk
model [22].

• If $W_i = \prod_{j=1}^{i} \alpha_j$, where the scalar $\alpha_j$ is the
attenuation factor along the $j$-th link in the path, then $E(q_{ij})$ reduces
to Bonacich centrality. For ease of computation, we have taken
$\alpha_1 = \beta$ and $\alpha_i = \alpha$ $\forall i \neq 1$.

• When $\beta = \alpha$, this in turn reduces to the Katz status score [1].

• When $\alpha_1 = 1$ and $\alpha_2 = \ldots = \alpha_n = \ldots = 0$, then
$E(q_{ij})$ is the degree centrality used in modularity-maximization approaches
[8].

In summary, we used Bonacich centrality to study the structure of networks,
specifically, to identify communities and important nodes in the network. We
extended the modularity-maximization class of algorithms to use b-centrality,
rather than edges, as a measure of network connectivity. We applied this
approach to benchmark networks studied in the literature and found that it
results in network division in close agreement with the ground truth. We also
used b-centrality to rank nodes in a network. By studying changes in rankings
that occur when the parameter $\alpha$ is varied, we were able to identify
locally important 'leaders' and globally important 'mediators' that facilitate
communication between different communities. We can easily extend this
definition to multi-modal networks that link entities of different types, and
use the approach described in this paper to study the structure of complex
networks [24].